# Multimodal Conversion

Index Anisora 5B Diffusers

An image-to-video generation model implemented with Diffusers, with 5B parameters.

Hunyuanvideo I2V

Tencent's HunyuanVideo-I2V is a Diffusers-based image-to-video model capable of converting static images into dynamic videos.

hunyuanvideo-community

Minicpm O 2 6 GGUF

MiniCPM-o-2_6 is a multimodal conversion model supporting multiple languages and suitable for various tasks.

Text-to-Image Other

This is an image-to-text conversion model capable of processing both image and text inputs to generate corresponding text outputs.

Vit GPT2 Image Captioning Model

An image caption generation model based on the ViT-GPT2 architecture, capable of converting input images into descriptive text.

Vchitect 2.0 2B

Vchitect-2.0 is a parallel Transformer model for scaling video diffusion models, specializing in text-to-video and image-to-video generation tasks.

Video Processing

This is a Transformers-based image-to-text conversion model; its specific functionality is not documented.

4M 7 SR L CC12M

4M is a scalable multimodal masked modeling framework that supports any-to-any modality conversion, covering dozens of modalities and tasks.

Multimodal Fusion

Hashtaggenerater

Flickr30k is an English dataset for image-to-text tasks, commonly used for training and evaluating image caption generation models.

Transformers English

© 2025 AIbase